Cap and coalesce terminal event streams to reduce memory growth #376
Merged
The terminal and worktree-info event streams used unbounded AsyncStream buffers, so a producer outpacing the main-actor consumer (e.g. a per-tab projection / progress / task-status storm) could grow the in-process buffer without bound and steadily increase memory use over a long session.

Switch both streams to bufferingNewest(2048). Coalesce the latest-wins terminal state events (tab projection, progress, task status, focus) by identity so a storm collapses to its last value, dropping only a value equal to the immediately previous one per key. Seed the coalesce cache from the resubscribe replay (both the pending-buffer drain and the projection re-seed run through emit) so the first live event after a resubscribe is deduped consistently. Bound and coalesce the pre-subscription pending buffer, mirror the teardown purge into it, and clear lastEmittedProjections alongside the coalesce keys on prune.

On a backpressure drop, log a compact identity of the shed event (case plus key ids, never the payload) so a drop storm can't flood the log. The worktree-info stream is capped but not deduped: its events are refresh signals where each repeat is meaningful. Surface a non-cancellation error from the per-worktree refresh loop instead of breaking silently.

Add tests for the coalescing, the live and pending buffer caps, the teardown purge, lifecycle events never coalescing, the pending-replay cache seeding, and the worktree-info buffer cap.
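The cap, the per-key coalescing, and the payload-free drop log described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `TerminalEvent`, `Coalescer`, `makeBoundedStream`, `emit`, and the log wording are hypothetical names invented for the example, and it assumes Swift 5.9 or later for `AsyncStream.makeStream(of:bufferingPolicy:)`.

```swift
// Hypothetical event type: the PR's real enum is not shown, so these
// cases and payloads are illustrative only.
enum TerminalEvent: Equatable {
    case progress(tabID: Int, percent: Int)      // latest-wins state
    case taskStatus(tabID: Int, status: String)  // latest-wins state
    case tabClosed(tabID: Int)                   // lifecycle: never coalesced

    /// Identity used for coalescing; nil means "always deliver".
    var coalesceKey: String? {
        switch self {
        case .progress(let tabID, _): return "progress:\(tabID)"
        case .taskStatus(let tabID, _): return "taskStatus:\(tabID)"
        case .tabClosed: return nil
        }
    }

    /// Case name plus key ids, never the payload, for drop logging.
    var compactIdentity: String {
        switch self {
        case .progress(let tabID, _): return "progress(tab \(tabID))"
        case .taskStatus(let tabID, _): return "taskStatus(tab \(tabID))"
        case .tabClosed(let tabID): return "tabClosed(tab \(tabID))"
        }
    }
}

/// Drops an event only when it equals the previous event for the same key.
struct Coalescer {
    private var lastEmitted: [String: TerminalEvent] = [:]

    mutating func shouldEmit(_ event: TerminalEvent) -> Bool {
        guard let key = event.coalesceKey else { return true }
        if lastEmitted[key] == event { return false }
        lastEmitted[key] = event
        return true
    }
}

/// Builds a stream that keeps at most `capacity` of the newest events.
func makeBoundedStream(capacity: Int = 2048)
    -> (stream: AsyncStream<TerminalEvent>,
        continuation: AsyncStream<TerminalEvent>.Continuation) {
    AsyncStream.makeStream(of: TerminalEvent.self,
                           bufferingPolicy: .bufferingNewest(capacity))
}

/// Yields an event; returns a compact log line if an event was shed.
func emit(_ event: TerminalEvent,
          to continuation: AsyncStream<TerminalEvent>.Continuation,
          coalescer: inout Coalescer) -> String? {
    guard coalescer.shouldEmit(event) else { return nil }
    switch continuation.yield(event) {
    case .dropped(let shed):
        return "event stream full, dropped \(shed.compactIdentity)"
    default:
        return nil  // enqueued, or the stream has already terminated
    }
}
```

With `.bufferingNewest`, a full buffer sheds its oldest element, so a wedged consumer sees the most recent state once it resumes. Returning nil from `coalesceKey` for lifecycle cases keeps them out of the dedupe path entirely.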
drain() ran Task.megaYield(count: 10_000) up to 64 times per call, and each megaYield spawns `count` detached tasks, so a single drain churned hundreds of thousands of tasks; the settle check also never early-returned. Socket-presence tests that drain several times took 13 to 43 seconds each despite a TestClock.

Lower the per-pass yield count and settle on a genuinely quiescent pass (consumer parked, nothing processed, and no idle-hook debounce still scheduled via a test-only count on the manager). The last clause closes the race where clock.advance returned but the awoken idle task had not yet emitted, which would otherwise let a busy suite conclude "idle" too early. The six socket tests drop from between 13 and 43 seconds each to between 0.06 and 1.5 seconds each, with no behavior change.
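The settle-on-a-quiescent-pass idea can be sketched as below. This is an assumption-laden illustration rather than the harness from this PR: `QuiescenceProbe`, its three closures, the `yieldsPerPass` default, and the returned pass count are all hypothetical, and plain `Task.yield()` stands in for `Task.megaYield` so the example depends only on the standard library.

```swift
/// Hypothetical test-only view of the system under test.
struct QuiescenceProbe {
    var consumerParked: () -> Bool          // consumer is suspended awaiting the next event
    var processedCount: () -> Int           // total events handled so far
    var scheduledIdleDebounces: () -> Int   // idle-hook debounces still pending
}

/// Yields in small batches and returns as soon as one full pass is quiescent.
/// Returns the number of passes it ran (illustrative; useful for assertions).
func drain(maxPasses: Int = 64,
           yieldsPerPass: Int = 50,   // illustrative value, not the PR's
           probe: QuiescenceProbe) async -> Int {
    for pass in 1...maxPasses {
        let before = probe.processedCount()

        // Give other tasks a chance to run without spawning detached tasks.
        for _ in 0..<yieldsPerPass {
            await Task.yield()
        }

        // Quiescent means all three conditions hold after the pass:
        // nothing was processed, the consumer is parked, and no idle
        // debounce is still scheduled to fire.
        let quiescent = probe.consumerParked()
            && probe.processedCount() == before
            && probe.scheduledIdleDebounces() == 0
        if quiescent { return pass }
    }
    return maxPasses
}
```

Checking the scheduled-debounce count is what prevents a pass from reporting "idle" in the window between the clock advancing and the awoken idle task emitting.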
Summary
The terminal and worktree-info event streams used unbounded AsyncStream buffers. When a producer outpaces the main-actor consumer (for example a per-tab projection / progress / task-status storm), the in-process buffer could grow without bound and steadily increase memory use over a long session. This bounds and coalesces those streams, and removes the redundant churn at the source.

Event streams
Both streams now use .bufferingNewest(2048) instead of an unbounded buffer, so a wedged consumer caps memory instead of growing forever.

Test harness
Reworked drain(), which spawned hundreds of thousands of detached tasks per call and never early-returned. The slowest socket-presence tests drop from between 13 and 43 seconds each to under 1.5 seconds each, with no behavior change.

Tests